{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 12.2 概率PCA"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "    <img src=\"../pic/12_09.png\">\n",
    "</center>\n",
    "\n",
    "⼀个观测数据点x的⽣\n",
    "成⽅式为：⾸先从潜在变量的先验分布$p(z)$中抽取⼀个潜在变量的值$\\hat z$，然后从⼀个各向同性的⾼斯分布\n",
    "（⽤红⾊圆圈表⽰）中抽取⼀个x的值，这个各向同性的⾼斯分布的均值为$w\\hat z+\\mu$，协⽅差为$\\sigma^2I$。绿⾊\n",
    "椭圆画出了边缘概率分布$p(x)$的密度轮廓线\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "基于线性高斯框架\n",
    "$$p(z)=\\mathcal N(z|0,I)\\\\\n",
    "p(x|z)=\\mathcal N(x|Wz+\\mu,\\sigma^2I)$$\n",
    "\n",
    "⽤最⼤似然的⽅式确定参数$W,\\mu,\\sigma^2$的值\n",
    "边缘概率:\n",
    "$$p(x)=\\mathcal(x|\\mu,C)\\\\\n",
    "C=WW^T+\\sigma^2I\\\\$$\n",
    "也可用加入噪声$\\epsilon\\sim\\mathcal N(\\epsilon|0,\\sigma^2I)$的观测变量推导出:\n",
    "$$\\mathbb E[x]=\\mathbb[Wz+\\mu+\\epsilon]=\\mu$$\n",
    "\n",
    "$$\\begin{align}cov[x]&=\\mathbb E[(Wz+\\epsilon)(Wz+\\epsilon)^T]\\\\\n",
    "&=\\mathbb E[Wzz^TW^T]+\\mathbb E[\\epsilon\\epsilon^T]\\\\\n",
    "&=WW^T+\\sigma^2I\\end{align}$$\n",
    "后验分布为:\n",
    "$$p(z|x)=\\mathcal N(z|M^{-1}W^T(x-\\mu),\\sigma^2IM^{-1})$$\n",
    "MxM的矩阵M为:\n",
    "$$M=W^TW+\\sigma^2I$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 12.2.1 最大似然PCA\n",
    "似然函数为:\n",
    "$$\\begin{align}lnp(X|\\mu,W,\\sigma^2)&=\\sum_{n=1}^Nln(x_n|W,\\mu,\\sigma^2)\\\\\n",
    "&=-\\frac{ND}{2}ln(2\\pi)-\\frac{N}{2}ln|C|-\\frac{1}{2}\\sum_{n=1}^N(x_n-\\mu)^TC^{-1}(x_n-\\mu)\\end{align}$$\n",
    "求关于$\\mu$的驻点，得到$\\mu=\\overline x$,再次代入到似然函数表达式:\n",
    "$$lnp(X|\\mu,W,\\sigma^2)=-\\frac{ND}{2}ln(2\\pi)-\\frac{N}{2}ln|C|-\\frac{N}{2}Tr(C^{-1}S)$$\n",
    "直接给出$W_{ML}$的表达式:\n",
    "$$W_{ML}=U_{M}(L_M-\\sigma^2I)^{\\frac{1}{2}}R$$\n",
    "$U_{M}$是一个$D\\times M$的矩阵，列由任意M个协方差矩阵S的特征向量组成，$L_M$就是对应的特征值$\\lambda_i$组成的$M\\times M$对角矩阵，R为$M\\times M$正交矩阵\n",
    "\n",
    "\n",
    "当M个特征向量被选为前M个最⼤的特征值所\n",
    "对应的特征向量时，对数似然函数可以达到最⼤值，其他所有的解都是鞍点。\n",
    "\n",
    "若特征向量按照特征值的大小降序排列,$\\sigma^2_{ML}$是与丢弃的维度相关联的平均⽅差:\n",
    "$$\\sigma^2_{ML}=\\frac{1}{D_M}\\sum_{i=M+1}^D\\lambda_i$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "R是正交的，将其视为M维潜在空间中的⼀个旋转矩阵，对于$R=I$时，$W$的列是主成分特征向量，由方差参数的平方根$\\sqrt{\\lambda_i-\\sigma^2}$缩放得到。因此，在特征向量ui⽅向上的⽅差λi由两部分相加得到，⼀部分来⾃于从单位⽅差潜在空间分布通过对应的W 的列向数据空间投影的贡献$\\lambda_i-\\sigma^2$，另⼀部分来⾃于在噪声模型的所有⽅向上相加的各项同性的⽅差的贡献$\\sigma^2$。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 概率PCA与传统PCA的关系\n",
    "传统的PCA通常的形式是D维空间的数据点在M维线性⼦空间上的投影。然⽽，概率PCA可\n",
    "以很⾃然地表⽰为从潜在空间到数据空间的映射($x=WZ+\\mu+\\epsilon$)\n",
    "\n",
    "使用贝叶斯定理对这个映射取逆，则数据点$x$可由潜在空间中的后验均值和⽅差进⾏概括:\n",
    "$$\\mathbb E[z|x]=M^{-1}W^T_{ML}(x-\\overline x)$$\n",
    "\n",
    "它到数据空间的一个点的投影为\n",
    "$$W\\mathbb E[z|x]+\\mu$$\n",
    "如果我们取极限$\\sigma^2\\rightarrow 0$，那么后验均值为:\n",
    "$$(W^T_{ML}W_{ML})^{-1}W^T_{ML}(x-\\overline x)$$\n",
    "表⽰数据点在潜在空间上的正交投影，因此我们就恢复出了标准的PCA模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
